@bwbarrett

When the ob1 PML is not eligible for selection (such as when the user sets --mca pml cm), the BML and BTL frameworks are not initialized, and the rdma osc component later fails because no BTLs are available. This patch resolves the issue by having the rdma osc component initialize the BML interface itself.

Making this change required two additional, related changes. First, since the BTLs use the modex, the rdma initialization must be moved before the modex point, so that data the BTLs put in the modex is exchanged as expected. Second, BTLs can require loading the entire world during init (such as usnic, or TCP when there are multiple threads and multiple NICs), so we extend the world loading checks to include OSC.
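
To illustrate why the ordering matters, here is a minimal sketch in C. It is a toy model, not Open MPI code: every name in it (`toy_modex_t`, `toy_modex_put`, `toy_osc_rdma_init`, `toy_startup`) is hypothetical, and it assumes only that data published after the modex point is never seen by peers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not Open MPI code: a "modex" that accepts
 * entries only until it has been exchanged with peers. */
typedef struct {
    bool exchanged;   /* true once the modex point has passed */
    int  entries;     /* number of entries published so far */
} toy_modex_t;

/* Publish one entry; fails if the exchange has already happened. */
static int toy_modex_put(toy_modex_t *m)
{
    if (m->exchanged) {
        return -1;    /* too late: peers would never see this entry */
    }
    m->entries++;
    return 0;
}

/* Stand-in for a component init that brings up num_btls transports,
 * each of which publishes its addressing data through the modex. */
static int toy_osc_rdma_init(toy_modex_t *m, int num_btls)
{
    assert(num_btls >= 0);
    for (int i = 0; i < num_btls; i++) {
        if (toy_modex_put(m) != 0) {
            return -1;
        }
    }
    return 0;
}

/* Run the component init either before or after the modex point and
 * return its result (0 on success, -1 on failure). */
static int toy_startup(bool init_before_modex, int num_btls)
{
    toy_modex_t m = { false, 0 };
    int rc;

    if (init_before_modex) {
        rc = toy_osc_rdma_init(&m, num_btls);
        m.exchanged = true;
    } else {
        m.exchanged = true;
        rc = toy_osc_rdma_init(&m, num_btls);
    }
    return rc;
}
```

In this model, `toy_startup(true, 2)` succeeds, while `toy_startup(false, 2)` fails because the transports try to publish after the exchange has already happened.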

Since the other Portals4 components say that they require world loading, we assume the Portals4 osc component requires it as well.
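
The extended world loading check can be sketched roughly as follows. This is a toy model, not the actual Open MPI code: the types, the function `toy_requires_world`, and the `include_osc` switch are all hypothetical and exist only to contrast a check that ignores OSC components with one that includes them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not Open MPI code. */
typedef enum { TOY_FW_PML, TOY_FW_OSC } toy_framework_t;

typedef struct {
    toy_framework_t framework;      /* framework the component belongs to */
    const char     *name;           /* illustrative label only */
    bool            requires_world; /* component needs all procs loaded */
} toy_component_t;

/* Return true if any selected component requires loading the whole
 * world. With include_osc == false, OSC components are skipped, which
 * models a check that does not yet cover OSC. */
static bool toy_requires_world(const toy_component_t *selected, int count,
                               bool include_osc)
{
    for (int i = 0; i < count; i++) {
        if (selected[i].framework == TOY_FW_OSC && !include_osc) {
            continue;
        }
        if (selected[i].requires_world) {
            return true;
        }
    }
    return false;
}
```

With a selection in which only an OSC component sets `requires_world`, the check returns false when OSC is skipped and true once OSC is included, which is the case the extended check is meant to catch.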

(cherry picked from commit 4215325)

Signed-off-by: Brian Barrett <[email protected]>
@bwbarrett bwbarrett requested a review from hppritcha July 1, 2025 18:22
@github-actions github-actions bot added this to the v5.0.8 milestone Jul 1, 2025
@janjust janjust merged commit a16908d into open-mpi:v5.0.x Jul 1, 2025
15 checks passed